OPERATING FREQUENCY REDUCTION FOR TRANSVERSAL FIR FILTER



Field of the Invention 

The present invention relates generally to computer network communications and
more particularly to methods and systems that allow analog transversal FIR filters
to operate at ultra high frequencies. More particularly, the present invention relates
to a method and a system that allows the use of double-edge clocking to reduce the
frequency of operation of a transversal FIR filter whose general functionality can
be used to implement a Feed Forward Equalizer (FFE) and a Decision Feedback
Equalizer (DFE). The invention is particularly relevant for systems that operate at
10Gb/s or above, where the reduction in operating frequency of a sub-block will
result in reduced power consumption.

Background of the Invention 

Description of Related Art 

A standard transversal FIR filter includes a set of latches, a set of respective
multiplication elements, and a summing node. The order of the filter defines the
number of latches contained in the data FIFO, where each latch output can be used
to drive a coefficient value/gain required for the FIR filter functionality to be
realized.

The latches of the transversal filter all operate using the same clock, referred to as
the High Speed Clock, which has a period T which is equal to the Unit Interval
(UI) of the serial data stream. In practical applications, the delay element is
implemented using a Flip-Flop that samples data present on an input on a given
clock edge, and holds the data value on an output for the duration of a clock
period.

In certain cases, it is advantageous to reduce the operating frequency of the clock
signal for reasons of technical feasibility or power consumption optimization. It is
possible to split the delay elements in the transversal filter into two groups, one of
which latches data on the rising edge of the clock signal, the other latching data on
the falling edge of the clock signal. This allows a High Speed clock signal with
a period T which is effectively twice the duration of a UI. It also implies that the
data sample is held by the delay element for two UI.

In order to improve Bit Error Rate performance in communications systems, a
transversal FIR filter is sometimes used in the receiver or the transmitter to correct
for InterSymbol Interference (ISI). An FFE is commonly used in a transmitter,
while a receiver will generally contain a DFE.

An FFE is an extension of a standard serializer transmit block, where data bits are
shifted through delay elements to be transmitted one at a time, but with a partial
contribution from other bits contained in the delay structure. An FFE serial
transmitter includes a set of delay elements, a set of multipliers, and a summing
node. The delay elements all operate using the same High Speed clock signal, and
shift data forward on only one edge (usually rising) of the clock. An FFE requires
that the output of a delay element be held for no more than one UI. Thus, the
period T of the High Speed clock is generally equal to one UI for proper
functionality.

A DFE receiver block is an extension of a standard serial bit receiver block. A
DFE receiver block includes a slicer, a set of delay elements, a set of multipliers,
and a summing node. The slicer and delay elements all operate using the same
High Speed clock signal, and sample data on only one edge (usually rising) of the
clock. A DFE requires that the data sample be held at the output of a delay element
for not more than one UI. Thus, in order for a DFE receiver block to function
correctly, the period T of the High Speed clock signal must be equal to the Unit
Interval of the incoming data stream.

In both the case of the FFE and DFE, increasing the High Speed clock period by a
factor of two would cause a functional failure. Therefore, there is a need to have an
efficient method and system that will allow a DFE to function using a double edge
clocking scheme, so that the frequency of operation of the transversal filter in an
FFE or DFE can be reduced.

SUMMARY OF INVENTION

The present invention is a method and system for reducing the frequency of
operation for a transversal Finite Impulse Response (FIR) filter. The transversal
filter operates in such a way that it has an even and odd row of data, which are
latched on rising and falling edges of the clock respectively. This allows the clock
frequency to be reduced by a factor of 2, and thus allows the use of more power
efficient latches. Reducing the frequency of operation causes the high speed
latches within the transversal filter to hold the data bits twice as long as is
required, and thus a circuit is required to select the appropriate data bits from the
output of the appropriate half-speed latch, and subsequently scale it to apply the
coefficient gain. Each of the subsystems is analog, and operates in accordance
with a synchronous clock system.

In one particular embodiment, the present invention may be characterized as a
method and system to allow a transversal filter to operate at a reduced frequency
while maintaining the Finite Impulse Response that is required by the application.
This is achieved by providing latches that operate at a slower sub-multiple of the
high speed clock and multiplexing the output of the slower latches in such a way
that the coefficient multipliers are driven by the correct data, and for the correct
duration. Advantageously, the reductions in frequency for the transversal filter
result in a high-speed circuit that may have considerably lower power consumption
than one that operates at full speed. Additionally, the multiplexing circuit may
directly apply the discrete gain required to create a coefficient, which by
construction will further reduce complexity, die area, and power consumption.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a simplified block diagram of a point-to-point backplane communication;
FIG. 2 is a simplified block diagram of a functional architecture and internal
constructions of an exemplary 10Gb/s SerDes which is outlined in FIG. 1;
FIG. 3a is a simplified block diagram of a transversal FIR filter using single edge
clocking;
FIG. 3b is a simplified block diagram of an embodiment of a shift register matrix
331 constructed in accordance with the principles of the invention;
FIG. 4 is a simplified block diagram of the present invention;
FIG. 5 is a simplified block diagram of a transversal FIR filter stage that
incorporates the present invention;
FIG. 6 is a simplified timing diagram for the first stage of a transversal filter using
the present invention;
FIG. 7 is a simplified timing diagram for a generic stage of a transversal filter that
incorporates the present invention;
FIG. 8 is a simplified block diagram of an exemplary DFE that incorporates the
present invention;
FIG. 9 is a simplified block diagram of an exemplary FFE that incorporates the
present invention.
DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method and a system for using a double-edge
clocking scheme and reducing the frequency of operation for a transversal FIR
filter. The invention comprises a set of 2:1 multiplexers, whose output
amplitude can be controlled such that it is possible to apply a gain to the selected
input signal. The invention is used in combination with a transversal FIR filter that
operates at one half the intended data rate. The transversal filter comprises 2
separate sets of analog latches, where one set is positive edge active and the other
set is negative edge active.

The present invention can be used to implement very high-speed transversal FIR
filters where the frequency requirements may be at the very limit of some
mainstream CMOS technologies and geometries. Since the frequency of operation
for the latches may be reduced by a factor of 2, the overall current consumption
can also be reduced, and thus an overall reduction in power consumption can be
realized through the use of the present invention.

In order to appreciate the advantages of the present invention, it will be beneficial
to describe the invention in the context of an exemplary 10Gb/s
Serializer/Deserializer (SerDes). The particular implementation chosen is depicted
in FIG. 1, which is a simplified block diagram of a single pair communication
system operating in half duplex mode over 2 pair differential copper backplane
traces.

In FIG. 1 the communication system is represented as a point-to-point system in
order to simplify the explanation, and includes two main SerDes blocks 100 and
102, coupled together via two pairs of differential high-speed copper traces 128a
and 128b. Each transceiver block 100 and 102 is capable of operating at a baud
rate exceeding 10Gb/s in each direction. Each transceiver 100 and 102 has a
high-speed analog interface 110 and a low-speed digital subsection 108. A phase
matching handoff 106 guarantees timing between the digital interface and analog
interface.

In the case of data transmission, the TX parallel data is encoded in the PCS block
104 and is fed into the MUX 116 at an appropriately scaled lower frequency clock.
Timing between the digital and analog interfaces is guaranteed by the handoff 106.
Encoded data is multiplexed from a parallel format into a high-speed serial format
at data rates exceeding 10Gb/s by the MUX block 116. Transmit data is equalized
by the TX_EQ 118 and subsequently transmitted into the channel 128 by an
impedance matched analog TX_IO block 120.

In the case of data reception, RX_IO 126 provides an impedance-matched buffer
between the channel 128 and the input of the RX_EQ 124. RX_EQ corrects the
attenuation and jitter introduced by the channel. A serial bit stream representing
latch decisions of the equalized data is then fed into the DEMUX 122, which in
turn will de-serialize the data into a parallel output word. The parallel data output
of the DEMUX 122 feeds the PCS block 104 through the handoff 106. The
handoff also serves to retime the data and filter jitter introduced by the channel
128b. The PCS block decodes the parallel data stream to reproduce the original
transmitted parallel data stream.

FIG. 2 is a simplified block diagram of the functional architecture and internal
constructions of an exemplary 10Gb/s SerDes 200, such as that described in
FIG. 1. The illustrative 10Gb/s SerDes, which includes the Receive (RX) and
Transmit (TX) paths 202 and 230 respectively, will be referred to as the "SerDes".

The SerDes RX path includes an incoming serial data stream 204, RX_IO 126,
High Pass Filter (HPF) 206, Summing Node 208, Decision Feedback Equalizer
(DFE) 216, Receive PLL (RXPLL) 212, DEMUX 122, RX Handoff 224, and the
RX PCS 226. The HPF 206 serves to pre-shape the spectral content of the signal in
such a way that data bit transitions are accentuated, which serves to partially
equalize the incoming data stream so that the RXPLL 212 can use it for clock
recovery. The recovered clock signal on 228 is phase and frequency correlated
with the incoming serial data stream 204 and is used by the DFE 216, DEMUX
122, and the Handoff 224. The Handoff 224 serves to absorb low frequency jitter
and guarantees that the RX_PCS 226 receives the data without any timing
violations.

Based on the signal quality criteria determined by specialized analog circuitry 210,
an adaptation algorithm 214 drives the coefficient settings of the analog DFE 216.
Since the DFE is based on a feedback mechanism, the perfectly equalized data
stream is formed at the Summing node 208, where the feedback response of the
DFE 216 and the feedforward response of the HPF 206 are linearly added to form
the totally equalized data stream. The DFE contains a transversal filter, which is
essentially a shift register. The output data stream of the DFE 222 represents
logical decisions made from the equalized data stream 208. The depth of the DFE
216 governs the latency of the data stream with respect to the input of the channel.
The DEMUX 122 contains multiple stages of 2:1 demultiplexers, which serve
to generate a lower speed parallel data bus 220 that will be processed by the
RX_PCS.

The SerDes TX path includes the TX_PCS 232, TX Handoff 234, MUX 116, TX
Equalizer (TX_EQ) 118, TX_PLL 246, TX_IO 120, and output data stream 250.
Using a reference clock 248, the TX_PLL 246 generates jitter free high-speed
clocks that will drive the TX Handoff 234, MUX 116, and TX_EQ 118. The MUX
contains a series of 2:1 multiplexers and is designed to transform a parallel data
stream 238 into a high-speed serial bit stream 242. A TX_EQ 118 uses the
outgoing serial bit stream 242 to generate the pre-shaped bit stream 250 that is
launched into the data channel via the output IO buffer 120. Transmit equalization
is often used to complement or enhance receiver-based equalization because of its
ease of implementation and straightforward operation.

The TX_EQ 118 and DFE 216 are both discrete time equalizers that require a
clock. The minimum required frequency of operation of the filters is the data rate
frequency. Therefore, a 10Gb/s data stream would require the equalizers to be
clocked with a 10GHz clock, assuming the circuit is active on a single rising or
falling edge of said clock. Equivalently, the period T of the clock would need to be
the same as the duration of a single data bit, where this duration is generally
referred to as a Unit Interval (UI). For 10Gb/s operation, the UI is 100ps.
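
The arithmetic in the preceding paragraph can be checked with a short Python
sketch. The function names and the `edges_per_period` parameter are illustrative
assumptions and do not appear in the described circuit.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Duration of one data bit (one UI) in picoseconds for a rate in Gb/s."""
    return 1000.0 / data_rate_gbps


def required_clock_ghz(data_rate_gbps: float, edges_per_period: int = 1) -> float:
    """Clock frequency in GHz when each active clock edge captures one bit.

    edges_per_period=1 models single-edge clocking (assumed default);
    edges_per_period=2 models double-edge clocking.
    """
    if edges_per_period < 1:
        raise ValueError("edges_per_period must be at least 1")
    return data_rate_gbps / edges_per_period


ui = unit_interval_ps(10.0)                 # 100.0 ps for 10Gb/s
single_edge = required_clock_ghz(10.0, 1)   # 10.0 GHz
double_edge = required_clock_ghz(10.0, 2)   # 5.0 GHz
```

With single-edge clocking the clock period equals one UI, whereas a clock that is
active on both edges needs only half the frequency for the same data rate.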

FIG. 3a shows a simplified block diagram of a transversal FIR filter using single
edge clocking. A transversal FIR filter 300 includes shift register 302, which has
delay elements 304 serially connected to the data signal on line 308. The delay
elements are switched by a clock signal on line 306 at a switching rate that is equal
to the bit rate. The delay τ of each element is equal to one bit period (the
reciprocal of the bit rate). This interval can also be expressed as the period T of
the clock signal appearing on line 306. The output of
each delay element 304 can then be used to drive a coefficient value 310 to
generate the FIR filter response 314. In the context of a 10Gb/s SerDes, generation
of a 10GHz clock and design of logic that can operate correctly within 100ps is
very challenging in contemporary CMOS technology, and there would be great
advantages in performance and power consumption if the circuitry could operate at
a lower frequency.
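
As a behavioral illustration of the single-edge structure of FIG. 3a, the following
Python sketch shifts the data through a chain of delay elements once per UI and
sums the weighted element outputs. It is a discrete-time model only; the name
`transversal_fir` and the zero-initialized delay line are assumptions made for the
example.

```python
from typing import List, Sequence


def transversal_fir(bits: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Full-rate transversal FIR: one shift per UI, one coefficient per delay element."""
    taps = [0.0] * len(coeffs)       # delay element outputs, assumed cleared at start
    response = []
    for b in bits:
        # One active clock edge per UI: every element takes its predecessor's value.
        taps = [b] + taps[:-1]
        # Each element output drives one coefficient; the results are summed.
        response.append(sum(c * t for c, t in zip(coeffs, taps)))
    return response


impulse_response = transversal_fir([1, 0, 0, 0], [0.5, 0.25])
```

Feeding a single 1 followed by zeros returns the coefficient values one UI apart,
which is the finite impulse response the later half-rate structure has to
preserve.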

Power consumption and design complexity of the FIR filter may be reduced if the
clock frequency of the filter 300 is reduced by a factor Q, where

$$Q = 2^{p}, \quad p \in \{0, 1, 2, 3, \ldots\} \qquad \text{(eq. 1)}$$

and where p is an integer.

FIG. 3b is a simplified block diagram of an embodiment of a shift register matrix
331 constructed in accordance with the principles of the invention. The shift
register matrix has a matrix of delay elements 334, the arrangement and operation
of which is as follows. The data signal on line 330 is provided to Q first delay
elements 334 arranged as the Q rows of delay elements 334 in the first or left most
column of the matrix of delay elements. In this arrangement, the delay of each
delay element is Q × τ, where, again, τ is equal to one bit period. This interval can
also be expressed as a multiple Q of the period T of the data clock signal of the
data signal appearing on line 330. Thus the delay elements 334 operate at a
reduced frequency from those in the configuration of Figure 3a for an equal data
rate on lines 308 and 330. The clock signal on line 332 has a period of Q × T,
where T is the period of the reference data rate or data clock signal. Expressed
another way, the clock signal on line 332 has a frequency that is a Q sub-multiple
of the data clock signal, thus the frequency of the clock on line 332 is 1/Q of the
frequency of the reference data rate.

Thus in the implementation of Figure 3b, reduced power consumption is achieved
as the frequency of operation of the filter is reduced by a factor Q from the
reference data rate.

The shift register matrix of Figure 3b is configured to shift data correctly without
dropping any bits. This is achieved by using latches 334 that operate on Q equally
spaced phases of the clock signal appearing on line 332, such that the data 330 is
captured every UI. Phase delay blocks 336, whose phase delay contribution is
defined by the relationship eq. 2, generate the clock signals appearing on lines
338.

$$\varphi_{n} = n \times \frac{360}{Q}, \quad n \in \{0, 1, \ldots, Q-1\} \qquad \text{(eq. 2)}$$
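
The phase relationship of eq. 2 can be sketched in Python as follows. The
functions `clock_phases_deg` and `capture_schedule` are hypothetical helpers
introduced for illustration; the second one assumes an idealized assignment in
which row n of the matrix captures bits n, n+Q, n+2Q and so on.

```python
from typing import List


def clock_phases_deg(q: int) -> List[float]:
    """Phase delay in degrees for each of the Q clock phases (eq. 2)."""
    if q < 1 or q & (q - 1):
        raise ValueError("Q must be a power of two (eq. 1)")
    return [n * 360.0 / q for n in range(q)]


def capture_schedule(num_bits: int, q: int) -> List[int]:
    """Row index that captures each successive bit, assuming one capture per UI."""
    return [t % q for t in range(num_bits)]


phases_q2 = clock_phases_deg(2)        # [0.0, 180.0]
phases_q4 = clock_phases_deg(4)        # [0.0, 90.0, 180.0, 270.0]
rows_q2 = capture_schedule(6, 2)       # [0, 1, 0, 1, 0, 1]
```

Because every bit index maps to exactly one row, no bit is dropped even though
each individual row is clocked only once every Q unit intervals.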

The delay of the latches 334 is increased by a factor of Q, so that the overall
timing of the system is preserved. This system has the advantage of operating
using a latch 334 that is Q times slower than the latch 304 in FIG. 3a. However,
where a shift register matrix 331 is used to implement an FIR filter, the response
of the filter would be incorrect regardless of the coefficient settings. The change in
the duration of the latched data signal passing along line 340 will prevent a filter
constructed using the shift register matrix of Figure 3b from generating a response
at the frequency of interest. An additional circuit is required to allow the shift
register matrix 331 to be used as part of a transversal FIR filter.

FIG. 4 is a detailed block level diagram of a multiplexer multiplier (mux/mul) 400
constructed in accordance with the principles of the invention. A multiplexer 410
with Q inputs 414 has an input select control 412 that selects an output to summer
408 from one of the inputs 414. When the mux/mul 400 is configured with a shift
register matrix 331 of Figure 3b, the selected input line of the Q inputs 414 is
multiplexed at the same frequency or clock rate as the serial data signal appearing
on line 330. That is, the multiplexer 410 is required to continuously select from Q
inputs 414 for a duration of 1UI. The output of the multiplexer 410 accommodates a
scaling factor related to a gain 404 and a polarity according to a sign 402. This
combination of functionality is referred to as a multiplexer-multiplier (Mux-Mul)
400.

Modifying the selected input value with respect to polarity and gain produces the
same effect as a coefficient multiplier. If the maximum signal swing is normalized
with the desired signal amplitude, then it is possible to scale the output of the
mux-mul as a function of control inputs 404 and 402. By construction, this mechanism
can be related directly to a coefficient used in an FIR filter. The coefficient value
is given by:

$$C(n) = D[n] \times G[x:0] \times \operatorname{sgn}(G) \qquad \text{(eq. 3)}$$

where C(n) 406 represents the applied response related to the nth coefficient of a
given transversal filter, D[n] represents any one input decision data 414, G[x:0]
represents the normalized magnitude 404 of the gain associated with the
coefficient, and sgn(G) represents the sign 402 of the gain which is applied. It
should be noted that only the input select 412 is changing at the same rate as the
input data stream, and this allows the Mux-Mul to simulate the effect of a
baud-spaced transversal filter for each coefficient.

Q Mux-Muls are required for every stage of a transversal filter, which means that
every stage of the transversal filter represents Q coefficients.
Further references to the Mux-Mul structure and transversal filters will be based
on Q=2, even though it is understood that Q can be any power of two as per Eq. 1.
When Q=2, the phase delay 336 is 180 degrees, which corresponds to the
complement of the filter clock. In the exemplary 10Gb/s SerDes, a single clock
phase is used throughout the circuit to further decrease complexity. Decision logic
is designed to be active on the falling edge of the clock in order to implement the
section of the transversal filter operating on the phase delayed clock 338.
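
The coefficient relationship of Eq. 3 may be illustrated with a minimal Python
sketch. The function `mux_mul`, its argument names, and the assumption that the
normalized gain magnitude lies in the range 0 to 1 are illustrative choices; the
actual circuit is analog and applies the gain to the multiplexer output amplitude.

```python
from typing import Sequence


def mux_mul(inputs: Sequence[float], select: int, gain: float, sign: int) -> float:
    """Behavioral Mux-Mul: C(n) = D[n] x G x sgn(G) for the selected input.

    inputs : the Q latched decision values presented to the multiplexer
    select : index of the input chosen during the current UI
    gain   : normalized gain magnitude, assumed here to lie in [0, 1]
    sign   : +1 or -1, the polarity applied to the gain
    """
    if sign not in (-1, 1):
        raise ValueError("sign must be +1 or -1")
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain is assumed to be normalized to [0, 1]")
    return inputs[select] * gain * sign


c_even = mux_mul([1, -1], select=0, gain=0.5, sign=-1)   # -0.5
c_odd = mux_mul([1, -1], select=1, gain=0.25, sign=1)    # -0.25
```

Only the `select` argument changes from one UI to the next, which mirrors the
observation that the input select is the only signal switching at the data rate.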

FIG. 5 is a detailed block diagram of the first stage 500 of a transversal filter that
uses the present invention 400 in the context of the exemplary 10Gb/s SerDes.
The stage has a 10Gb/s serial input data stream 502, 5GHz clock input 504, coefficient
control signals 402 and 404, coefficient outputs 501 and 503, and latched data
outputs 506 and 508 that are the even and odd decisions of the latches 514 and
516. Rising-edge active latch 514 and falling-edge active latch 516 have a delay
time of 200ps. The Mux-Muls 412 have two data inputs 510 and 512, and an
input select 412. Since Q=2, there are two coefficients 501 and 503 that are
generated by the stage. The even and odd decision data 506 and 508 are held for
200ps and will be used by the next stage in the filter. The 5GHz clock 502 is used
as a logical input select 412 for each Mux-Mul, where one input 510 or 512 is
selected in alternating fashion for 1UI.
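
The behavior of such a Q=2 stage can be sketched in Python at a resolution of
one UI per step. This is an idealized model under stated assumptions: the even
latch captures even-numbered bits, the odd latch captures odd-numbered bits, each
holds its value for two UI, and both start at zero. The name `half_rate_stage`
and the coefficient arguments `c1` and `c2` are hypothetical.

```python
from typing import List, Sequence, Tuple


def half_rate_stage(bits: Sequence[int], c1: float, c2: float) -> Tuple[List[float], List[float]]:
    """Return the two coefficient outputs of one Q=2 stage, one value per UI."""
    even = 0    # output of the rising-edge latch, changes every second UI
    odd = 0     # output of the falling-edge latch, changes every second UI
    out1: List[float] = []
    out2: List[float] = []
    for t, b in enumerate(bits):
        # The clock level (here the parity of t) acts as the input select:
        # one Mux-Mul picks the most recently captured bit, the other the older one.
        if t % 2 == 1:
            newest, older = even, odd
        else:
            newest, older = odd, even
        out1.append(c1 * newest)
        out2.append(c2 * older)
        # Clock edge at the end of UI t: the latch of matching parity captures bit t.
        if t % 2 == 0:
            even = b
        else:
            odd = b
    return out1, out2


first, second = half_rate_stage([1, 0, 1, 1, 0, 1], c1=1.0, c2=1.0)
```

Although each latch output only changes every second UI, the alternating
selection yields one output that follows the data delayed by 1UI and another
delayed by 2UI, as a full-rate two-tap delay line would.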

In the case where analog latches are used to build the transversal filter, as opposed
to true digital Flip-Flops, the outputs of the even latches are connected to the
inputs of the odd latches and vice versa. This has to do with the fact that latches,
unlike Flip-Flops, have a tracking stage that starts when the clock is low, and a
regeneration stage that starts when the clock is high. If a series of identical latches
were cascaded to form a shift register, all the latches would enter either tracking or
regeneration mode at the same time, which would cause the circuit to fail. By
alternating active high and active low latches in series, a following latch will track
what the previous latch has regenerated, and thus the data can properly move
through the shift structure. The overall functionality of the transversal filter is not
changed, and this point is specifically related to the implementation of the circuit,
not the principle of the present invention. FIG. 5a and FIG. 8b illustrate this
interconnect scheme clearly.
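
The failure mode of identical cascaded latches can be illustrated with a
simplified Python model. It treats each latch as transparent (tracking) at one
clock level and holding (regenerating) at the other, advances in half clock
periods, and presents one input value per half period. The function
`run_latch_chain` and the `polarities` encoding are assumptions made for this
sketch and do not model the analog regeneration itself.

```python
from typing import List, Sequence


def run_latch_chain(samples: Sequence[int], polarities: Sequence[int]) -> List[List[int]]:
    """Return the latch states after each half clock period.

    polarities[i] is the clock level (0 or 1) at which latch i tracks its input;
    at the opposite level the latch holds its stored value.
    """
    state = [0] * len(polarities)
    history = []
    for step, value in enumerate(samples):
        clk = step % 2                       # clock level during this half period
        # Evaluate front to back so that transparent latches pass data onward.
        for i, pol in enumerate(polarities):
            if pol == clk:
                state[i] = value if i == 0 else state[i - 1]
        history.append(list(state))
    return history


identical = run_latch_chain([1, 1, 0], [0, 0, 0])
alternating = run_latch_chain([1, 1, 0], [0, 1, 0])
```

With identical polarities the first input value appears at every latch output
within the same half period, so the chain provides no delay. With alternating
polarities the value advances by exactly one latch per half period.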


FIG. 6 is a timing diagram 600 related to the first slice of the FIR filter. The
incoming data signal 504 is a serial bit stream, with a data period of 1UI. The
clock signal 602 has a period of 2UI. The decision data signal 614 for the even
latch 514 has a UI advance on the decision data signal 616 of the odd latch 516.
The lowercase notation bX denotes the incoming serial data stream with duration
1UI, and the uppercase notation BX denotes the corresponding decision data with
duration 2UI. The overall effect of the present invention can be seen in the
coefficient output signals 610 and 612. There are two output signals 610 and 612
with duration 1UI that respect Eq. 3 and produce the desired coefficient response
required for the FIR filter application. FIG. 7 is a similar timing diagram 700 that
is specific to the rest of the slices in the FIR filter, where latches are acting on
decisions made by a previous stage in the FIR filter. The decision signals 708 and
710 are delayed versions of the decision signals 704 and 706 from the previous
stage of the filter. The data signals 708 and 710 are used to drive a new set of
coefficient signals 712 and 714.

The basic structure 500 can be used to build a Feed-Forward transversal FIR filter
or a Decision Feedback FIR filter. If the mux-mul coefficient outputs are summed
for every stage and used as a stand-alone response it is considered a Feed-Forward
Equalizer (FFE) that would correspond to the TX_EQ 118. If the coefficient
outputs for every stage are summed and fed back to the input of the filter it is
considered a Decision Feedback Equalizer (DFE) that would correspond with the
DFE 216.
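
The distinction between the two uses of the summed coefficient outputs can be
sketched behaviorally in Python. The functions `ffe` and `dfe`, the bipolar
symbol values, and the simple slicer threshold at zero are assumptions for
illustration; the sketch works at full rate and ignores the half-rate latch
structure.

```python
from typing import List, Sequence


def ffe(symbols: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Feed-forward: the weighted sum of current and past symbols is the output."""
    out = []
    for t in range(len(symbols)):
        out.append(sum(c * symbols[t - k] for k, c in enumerate(coeffs) if t - k >= 0))
    return out


def dfe(received: Sequence[float], fb_coeffs: Sequence[float]) -> List[int]:
    """Decision feedback: weighted past decisions are added back at the input."""
    decisions: List[int] = []
    for t, r in enumerate(received):
        feedback = sum(c * decisions[t - 1 - k]
                       for k, c in enumerate(fb_coeffs) if t - 1 - k >= 0)
        equalized = r + feedback                      # summing node
        decisions.append(1 if equalized >= 0 else -1)  # slicer decision
    return decisions


sent = [1, -1, -1, 1, 1, -1]
# Assumed channel for the example: each symbol leaks 1.2x into the next UI.
received = [s + 1.2 * (sent[t - 1] if t > 0 else 0) for t, s in enumerate(sent)]
recovered = dfe(received, [-1.2])
```

In this example a plain slicer applied to `received` would make errors, while
feeding back the previous decision with a coefficient of -1.2 removes the
interference before each new decision is taken.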

FIG. 8 outlines a DFE 124 that is configured to have six coefficients 806 using
three stages 500, and which is used in the exemplary 10Gb/s SerDes Receive path
202. The output coefficient signals 806 are summed and fed back to the input of
the first stage, where the sum is combined with the output of the HPF 206 at the
summing node 208. The entire structure is synchronously clocked using the
recovered 5GHz clock 228. The data outputs 802 and 804 from the last stage of the
DFE form the data input 222 to the DEMUX 122. The DEMUX block 122 has one
less stage due to the fact that the original serial bit stream was already
de-multiplexed by a factor of two by the DFE, hence Data_even 802 and Data_odd
804.
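
The way three Q=2 stages yield six coefficient inputs can be sketched in Python.
The model below is idealized: it uses edge-triggered even and odd rows that each
shift once every two UI, starts from all-zero latches, and omits the feedback
loop and the cross-connection used with analog latches. The function name
`cascaded_taps` is hypothetical.

```python
from typing import List, Sequence


def cascaded_taps(bits: Sequence[int], num_stages: int) -> List[List[int]]:
    """For each UI, return the 2 * num_stages values selected by the Mux-Muls."""
    even = [0] * num_stages   # even-row latch outputs, one per stage
    odd = [0] * num_stages    # odd-row latch outputs, one per stage
    rows = []
    for t, b in enumerate(bits):
        taps: List[int] = []
        for s in range(num_stages):
            # The select alternates every UI between the two rows of each stage.
            if t % 2 == 1:
                taps += [even[s], odd[s]]
            else:
                taps += [odd[s], even[s]]
        rows.append(taps)
        # Clock edge at the end of UI t: only the row of matching parity shifts.
        row = even if t % 2 == 0 else odd
        for s in range(num_stages - 1, 0, -1):
            row[s] = row[s - 1]
        row[0] = b
    return rows


taps_per_ui = cascaded_taps([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], num_stages=3)
```

Each row of `taps_per_ui` holds the data delayed by 1UI through 6UI, so
multiplying the entries by six coefficients and summing them gives the same
response as a six-tap full-rate delay line.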

FIG. 9 outlines a TX_EQ 118 that is configured to have four coefficients using
two stages 500, and which is used in the exemplary 10Gb/s SerDes Transmit path
230. The output coefficient signals 906 are summed at 908 and fed forward into
the TX_IO driver 120. The entire structure is synchronously clocked with the
locally generated clock 236. The input data 242 from the MUX 116 requires a final
stage of multiplexing. The final stage of the multiplexing is performed by the
Mux-Muls within the transversal filter, where D_even 902 and D_odd 904 are
multiplexed into a serial response 910.